prdoyle
Contributor

prdoyle commented Sep 10, 2025

Problem

When processing PDFs, Apache Tika calls PDFBox, which creates a MemoryCacheImageInputStream. Until JDK 26, that class uses sun.java2d.Disposer, whose static initializer creates a cleanup thread. Disposer lives in the java.desktop module, which we exclude from system modules, so the entitlement check for manage_threads fails.
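
As a rough illustration of the chain described above, the following self-contained Java sketch constructs a MemoryCacheImageInputStream directly (no Tika or PDFBox involved) and then looks for the cleanup thread. The class name `DisposerThreadDemo` and its helper methods are hypothetical, and the thread name `"Java2D Disposer"` is an assumption based on what OpenJDK has used, not an API guarantee. Per the description above, the thread would not be expected on JDK 26 and later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.stream.MemoryCacheImageInputStream;

public class DisposerThreadDemo {
    // Assumption: name of the cleanup thread started by sun.java2d.Disposer's static initializer.
    static final String ASSUMED_THREAD_NAME = "Java2D Disposer";

    /** Returns true if a live thread with the assumed Disposer name exists. */
    public static boolean disposerThreadPresent() {
        return Thread.getAllStackTraces().keySet().stream()
            .anyMatch(t -> ASSUMED_THREAD_NAME.equals(t.getName()));
    }

    /** Wraps the bytes in a MemoryCacheImageInputStream and reads the first byte back. */
    public static int readFirstByte(byte[] data) {
        // Constructing the stream is the step that appears in the stack trace below.
        try (MemoryCacheImageInputStream in =
                 new MemoryCacheImageInputStream(new ByteArrayInputStream(data))) {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("disposer thread before: " + disposerThreadPresent());
        System.out.println("first byte: " + readFirstByte(new byte[] { 42 }));
        System.out.println("disposer thread after: " + disposerThreadPresent());
    }
}
```

Under the entitlement checks, the thread setup inside the static initializer is the call that gets rejected, which is why the failure surfaces from an otherwise harmless stream constructor.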

Workaround

For the time being, just have the server grant manage_threads to java.desktop. This is a fairly broad grant, so we'd like to find a more surgical solution, but this problem is observed in production right now, so we need a fix.

Stack trace

org.elasticsearch.entitlement.runtime.api.NotEntitledException: component [(server)], module [java.desktop], class [class sun.java2d.Disposer], entitlement [manage_threads]
	at [email protected]/org.elasticsearch.entitlement.runtime.policy.PolicyCheckerImpl.notEntitled(PolicyCheckerImpl.java:467)
	at [email protected]/org.elasticsearch.entitlement.runtime.policy.PolicyCheckerImpl.checkFlagEntitlement(PolicyCheckerImpl.java:443)
	at [email protected]/org.elasticsearch.entitlement.runtime.policy.PolicyCheckerImpl.checkEntitlementPresent(PolicyCheckerImpl.java:481)
	at [email protected]/org.elasticsearch.entitlement.runtime.policy.PolicyCheckerImpl.checkManageThreadsEntitlement(PolicyCheckerImpl.java:406)
	at [email protected]/org.elasticsearch.entitlement.runtime.policy.ElasticsearchEntitlementChecker.check$java_lang_Thread$setDaemon(ElasticsearchEntitlementChecker.java:2647)
	at java.base/java.lang.Thread.setDaemon(Thread.java)
	at java.desktop/sun.java2d.Disposer.<clinit>(Disposer.java:80)
	at java.desktop/javax.imageio.stream.MemoryCacheImageInputStream.<init>(MemoryCacheImageInputStream.java:76)
	at org.apache.pdfbox.filter.LZWFilter.doLZWDecode(LZWFilter.java:78)
	at org.apache.pdfbox.filter.LZWFilter.decode(LZWFilter.java:70)
	at org.apache.pdfbox.filter.Filter.decode(Filter.java:97)
	at org.apache.pdfbox.filter.Filter.decode(Filter.java:285)
	at org.apache.pdfbox.cos.COSStream.createView(COSStream.java:196)
	at org.apache.pdfbox.pdmodel.PDPage.getContentsForRandomAccess(PDPage.java:267)
	at org.apache.pdfbox.pdmodel.PDPage.getContentsForStreamParsing(PDPage.java:256)
	at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:59)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:534)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:515)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:158)
	at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:153)
	at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:379)
	at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:136)
	at org.apache.tika.parser.pdf.AbstractPDF2XHTML.processPages(AbstractPDF2XHTML.java:1362)
	at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:252)
	at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:107)
	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:219)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:204)
	at org.apache.tika.Tika.parseToString(Tika.java:525)
	at org.elasticsearch.ingest.attachment.TikaImpl.parse(TikaImpl.java:73)
	at org.elasticsearch.ingest.attachment.AttachmentProcessor.execute(AttachmentProcessor.java:123)
	at [email protected]/org.elasticsearch.ingest.CompoundProcessor.innerExecute(CompoundProcessor.java:171)
	at [email protected]/org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:146)
	at [email protected]/org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:214)
	at [email protected]/org.elasticsearch.ingest.IngestDocument.executePipeline(IngestDocument.java:1125)
	at [email protected]/org.elasticsearch.ingest.IngestService.executePipeline(IngestService.java:1369)
	at [email protected]/org.elasticsearch.ingest.IngestService.executePipelines(IngestService.java:1199)
	at [email protected]/org.elasticsearch.ingest.IngestService$1.doRun(IngestService.java:1039)
	at [email protected]/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
	at [email protected]/org.elasticsearch.ingest.IngestService.executeBulkRequest(IngestService.java:1046)
	at [email protected]/org.elasticsearch.action.bulk.TransportAbstractBulkAction.processBulkIndexIngestRequest(TransportAbstractBulkAction.java:303)
	at [email protected]/org.elasticsearch.action.bulk.TransportAbstractBulkAction.lambda$applyPipelines$3(TransportAbstractBulkAction.java:283)
	at [email protected]/org.elasticsearch.action.ActionListener.run(ActionListener.java:465)
	at [email protected]/org.elasticsearch.action.bulk.TransportAbstractBulkAction.applyPipelines(TransportAbstractBulkAction.java:273)
	at [email protected]/org.elasticsearch.action.bulk.TransportAbstractBulkAction.applyPipelinesAndDoInternalExecute(TransportAbstractBulkAction.java:444)
	at [email protected]/org.elasticsearch.action.bulk.TransportAbstractBulkAction$2.doRun(TransportAbstractBulkAction.java:190)
	at [email protected]/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1067)
	at [email protected]/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
	at java.base/java.lang.Thread.run(Thread.java:1447)

prdoyle self-assigned this Sep 10, 2025
prdoyle added the >non-issue, auto-backport, :Core/Infra/Entitlements, v9.2.0, and v8.19.5 labels Sep 10, 2025
prdoyle marked this pull request as ready for review September 10, 2025 14:21
prdoyle requested a review from a team as a code owner September 10, 2025 14:21
@elasticsearchmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

elasticsearchmachine added the Team:Core/Infra label Sep 10, 2025
prdoyle force-pushed the disposer-entitlements branch from 32fe444 to 9b33eb5 Compare September 10, 2025 14:55
Contributor

@ldematte left a comment

looks good, just some comments/questions

Fortunately, the JDK is still modular even if the tests are run sans modules,
so we can use the module API to identify the packages in every system module.
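
A minimal sketch of the idea in the note above, assuming only the standard `java.lang.module` API: even when application code runs on the classpath, `ModuleFinder.ofSystem()` still describes the JDK's modules, so the packages of selected system modules can be enumerated. The class and method names (`SystemPackageOwners`, `packageToModule`) are hypothetical, and this is not the PR's actual code; the PR maps packages to a PolicyScope, which is simplified here to the module name.

```java
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReference;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class SystemPackageOwners {
    /** Maps every package of the named system modules to the module that contains it. */
    public static Map<String, String> packageToModule(Set<String> moduleNames) {
        Map<String, String> owners = new HashMap<>();
        // ofSystem() works regardless of whether the application itself is modular.
        for (ModuleReference ref : ModuleFinder.ofSystem().findAll()) {
            String module = ref.descriptor().name();
            if (moduleNames.contains(module)) {
                // packages() includes non-exported packages such as sun.java2d.
                for (String pkg : ref.descriptor().packages()) {
                    owners.put(pkg, module);
                }
            }
        }
        return owners;
    }

    public static void main(String[] args) {
        Map<String, String> owners = packageToModule(Set.of("java.desktop"));
        System.out.println("sun.java2d is owned by: " + owners.get("sun.java2d"));
    }
}
```

A lookup keyed by package name is what makes this usable without modules: a class loaded from the classpath still reports its package, even though it does not report a meaningful named module.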
Contributor

@ldematte left a comment

Good job coding the package -> scope map for the excluded modules! I still sense some friction (like it should be simpler), but I think this is the best we can do currently. LGTM

prdoyle merged commit 9b41320 into elastic:main Sep 11, 2025
34 checks passed
prdoyle added a commit to prdoyle/elasticsearch that referenced this pull request Sep 11, 2025
* Grant manage_threads to java.desktop for Tika

* Produce a PolicyScope for excluded system modules.

Fortunately, the JDK is still modular even if the tests are run sans modules,
so we can use the module API to identify the packages in every system module.

* TODO with Jira link
@elasticsearchmachine
Collaborator

💔 Backport failed

| Status | Branch | Result |
|---|---|---|
| | 8.19 | |
| | 9.1 | |
| | 8.18 | Commit could not be cherrypicked due to conflicts |
| | 9.0 | Commit could not be cherrypicked due to conflicts |

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 134454`

elasticsearchmachine pushed a commit that referenced this pull request Sep 11, 2025
@prdoyle
Contributor Author

prdoyle commented Sep 11, 2025

Backport stragglers are in #134559

@julian-elastic
Contributor

@prdoyle @ldematte
This is causing test failures on main. I am not sure why it is not happening on the build machines, but it is definitely happening on multiple people's Macs. To reproduce, run `./gradlew -p x-pack/plugin/esql/ test`, which results in 100+ failures.

I tried a simple fix to add more permissions, but it does not seem to resolve the issue. Can you please consider reverting this PR temporarily or working on a fix?

julian-elastic added a commit that referenced this pull request Sep 11, 2025
prdoyle added a commit to prdoyle/elasticsearch that referenced this pull request Sep 11, 2025
…ic#134559)
sarog pushed a commit to portsbuild/elasticsearch that referenced this pull request Sep 11, 2025
…ic#134544)
elasticsearchmachine pushed a commit that referenced this pull request Sep 11, 2025
…34580)
@mosche
Contributor

mosche commented Sep 12, 2025

I'll have a look and will fix asap @julian-elastic
I'm quite surprised to see a java.awt dependency in our code 🙈

@mosche
Contributor

mosche commented Sep 12, 2025

Oh, I missed #134578

sarog pushed a commit to portsbuild/elasticsearch that referenced this pull request Sep 19, 2025
…ic#134544)
phananh1010 added a commit to phananh1010/elasticsearch that referenced this pull request Sep 23, 2025
BASE=41fea9d8a715b1e2ffb668c3cf54c6c9645f0331
HEAD=135d74b33ba9f1ce2b0e61fa5fb2a39a3763c688
Branch=main
Labels

auto-backport, :Core/Infra/Entitlements, >non-issue, Team:Core/Infra, v8.18.8, v8.19.5, v9.0.8, v9.1.5, v9.2.0